Original Article 






Healthc Inform Res. 2010 June;16(2):77-81. 






doi: 10.4258/hir.2010.16.2.77

pISSN 2093-3681 • eISSN 2093-369X




Healthcare Informatics Research 



Diagnostic Analysis of Patients with Essential 
Hypertension Using Association Rule Mining 



A Mi Shin, RN, MS¹, In Hee Lee, MS¹, Gyeong Ho Lee, BS¹, Hee Joon Park, PhD¹, Hyung Seop Park, MD, MS²,
Kyung Il Yoon, PhD¹, Jung Jeung Lee, MD, PhD³, Yoon Nyun Kim, MD, PhD¹,²

Departments of ¹Medical Informatics, ²Internal Medicine, and ³Preventive Medicine, School of Medicine, Keimyung University, Daegu, Korea

Objectives: The purpose of this study was to analyze the records of patients diagnosed with essential hypertension using
association rule mining (ARM). Methods: Patients with essential hypertension (ICD code, I10) were extracted from a
hospital's data warehouse, and a data mart was constructed for analysis. Apriori modeling of the ARM method and the web
node in the Clementine 12.0 program were used to analyze patient data. Results: Patients diagnosed with essential
hypertension totaled 5,022, and the diagnostic data extracted from those patients numbered 53,994. The web node showed
that essential hypertension, non-insulin-dependent diabetes mellitus (NIDDM), and cerebral infarction were associated.
Based on the results of ARM, NIDDM (support, 35.15%; confidence, 100%) and cerebral infarction (support, 21.21%;
confidence, 100%) were determined to be important diseases associated with essential hypertension. Conclusions:
Essential hypertension was strongly associated with NIDDM and cerebral infarction. This study demonstrated the
practicality of ARM in co-morbidity studies using a large clinical database.
I. Introduction

Cardiovascular and cerebrovascular diseases, along with 
cancer, are the three major causes of death. The mortality
rate of diseases of the circulatory system is 117.2 per 10,000. 
Among diseases of the circulatory system, the mortality rate 
per 10,000 is in the following order: cerebrovascular dis- 
ease (59.6), cardiovascular disease (43.7), and hypertensive 
disease (11.0) [1]. Moreover, hypertension has the highest
prevalence among diseases of the circulatory system. However,
over one-half of patients with hypertension are not aware of
their disease, and even those who are diagnosed often do not
comply with the recommended management. Indeed, after diagnosis,
only approximately 20% of patients continue the recommended
treatment as prescribed, and over 65% discontinue treatment
against medical advice [2-4]. The clinical importance of
hypertension lies less in the condition itself than in its
co-morbidities, such as stroke, myocardial infarction, congestive
heart failure, and peripheral vascular disease. Of greatest
importance, hypertension contributes to the occurrence of
cerebrovascular disease (35%) and ischemic heart disease
(21%) [5].

Figure 1. The analysis process.

Various conventional studies have shown that hypertension is
related to other diseases, such as cerebrovascular and
cardiovascular diseases. However, studies demonstrating an
association among the co-morbidities of hypertension have not
been reported. Therefore, in this study, we determined the
relationship among co-morbidities of hypertension based on
association rule mining (ARM). ARM is a powerful method for
analyzing the association among three or more co-morbidities
for the following reasons: 1) ARM can manage the relationship
of several items, and 2) the confidence value can be used in
arithmetic operations [6].

II. Methods 

1. Subject of Investigation

In this study, we used the data of inpatients over 18 years of
age with essential hypertension at hospital A in city D. The
data were collected from electronic medical records for the
period from May 2005 to December 2007.

2. The Process of Study and Data Collection 

The ARM-based process used to analyze patients diagnosed
with essential hypertension is shown in Figure 1. From the data
warehouse (D/W), we collected the diagnostic data of patients
with essential hypertension, classified as I10 according to the
International Classification of Diseases (ICD) and the Korean
Classification of Diseases (KCD). Personal information, such as
name, resident registration number, and telephone number, was
removed from the data.

3. Constructing Data Mart for Patients with Essential 
Hypertension 

A total of 5,022 patients were diagnosed with essential
hypertension, and the total diagnostic data numbered 53,994.
If a patient was diagnosed with the same disease several times,
the support for that disease would be inflated. Therefore, we
removed duplicated data by comparing the registration number
and diagnosis code. Diagnoses related to external factors, such
as injury, poisoning, and certain other consequences of external
causes (S00-T98), external causes of morbidity and mortality
(V01-Y98), factors influencing health status and contact with
health services (Z00-Z99), and codes for special purposes
(U00-U99), were also removed from the data mart. A data mart
with 26,823 cases was constructed and used for correlation
analysis.
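The two cleaning steps described above (dropping repeated patient/diagnosis pairs and excluding the listed ICD-10 code ranges) can be sketched as follows. This is a minimal illustration, not the authors' extraction procedure: the record layout, the function names `is_excluded` and `build_data_mart`, and the truncation of codes to their three-character category are all assumptions made for the example.

```python
# Hypothetical sketch of the data-mart cleaning steps; names and record
# layout are assumptions, not taken from the study's actual D/W schema.

# ICD-10 category ranges removed from the data mart, as listed in the text.
EXCLUDED_RANGES = [("S00", "T98"), ("V01", "Y98"), ("Z00", "Z99"), ("U00", "U99")]


def is_excluded(code: str) -> bool:
    """Return True if the 3-character ICD-10 category falls in an excluded range."""
    head = code[:3].upper()
    return any(lo <= head <= hi for lo, hi in EXCLUDED_RANGES)


def build_data_mart(records):
    """Keep one row per (patient, diagnosis category), skipping excluded codes.

    `records` is an iterable of (patient_id, dx_code) pairs. Codes are reduced
    to their 3-character category here, which is an assumption of this sketch.
    """
    seen = set()
    mart = []
    for patient_id, code in records:
        key = (patient_id, code[:3].upper())
        if is_excluded(code) or key in seen:
            continue
        seen.add(key)
        mart.append(key)
    return mart


# Toy records: P1 has a repeated I10 and an injury code, P2 a Z code and a repeat.
records = [
    ("P1", "I10"), ("P1", "I10"), ("P1", "E11"), ("P1", "S72"),
    ("P2", "I10"), ("P2", "Z51"), ("P2", "I63"), ("P2", "I63"),
]
mart = build_data_mart(records)
print(mart)
```

Removing the repeats before mining matters because each patient should contribute at most once to the count of a given diagnosis.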

4. Analysis Method 

The statistical analysis program, SPSS Clementine 12.0 (SPSS 
Inc., Chicago, IL, USA), was used. Frequency analysis was 
performed on gender, age, and other diseases of the patients 
with hypertension. Moreover, Apriori modeling and the web
node were applied to analyze the strengths of the associations
between hypertension and other diseases.

The web node is a visualization tool that represents the
relationships between items, and Apriori modeling is an ARM
method that can be applied to binomial or multinomial data
types. ARM analyzes how often item A and item B occur together.
The support is defined as the percentage of transactions that
contain both diagnosis 1 (Dx1) and diagnosis 2 (Dx2); it may be
regarded as P(Dx1 ∪ Dx2), the probability that a transaction
contains every item in the set {Dx1, Dx2}, and is
direction-independent. The confidence is defined as the ratio of
the support of the item set (Dx1 ∪ Dx2) to the support of the
item set Dx1, which roughly corresponds to the conditional
probability P(Dx2 | Dx1), and is direction-dependent. In
epidemiologic terms, the support resembles the prevalence of
Dx1 together with Dx2 within a certain period of time. The
confidence, the ratio of the co-occurrence rate of Dx1 and Dx2
to the prevalence of Dx1, resembles the co-morbidity of Dx2
with Dx1 within the same period of time. As a result of the
Apriori modeling, association rules are evaluated on their
support and confidence values [6-9]:

\[
\text{Support (\%)} = \frac{\text{Number of disease } A \cap B}{\text{Total number of disease}}
\]
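As a minimal sketch of the definitions above, the following code computes support and confidence, plus the lift reported later in Table 2, on a toy set of transactions. It assumes that each patient is one transaction holding that patient's set of diagnosis codes; the function names and the four example patients are illustrative only.

```python
# Illustrative sketch: each transaction is assumed to be one patient's set of
# diagnosis codes. The toy data below are invented for the example.

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the support of the consequent alone."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


patients = [
    {"I10", "E11"},
    {"I10", "I63"},
    {"I10", "E11", "I63"},
    {"I10"},
]

# Support is direction-independent; confidence is not.
print(support(patients, {"I10", "E11"}))
print(confidence(patients, {"E11"}, {"I10"}))
print(confidence(patients, {"I10"}, {"E11"}))
print(lift(patients, {"E11"}, {"I10"}))
```

In this toy set every patient carries I10, so the rule from E11 to I10 has a confidence of 1 while the reverse rule does not, which mirrors the direction dependence described in the text.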
III. Results

1. Patient Gender and Age Distribution 

The data consisted of 2,508 males (49.94%) and 2,514 females
(50.06%), for a total of 5,022 patients. In the age
distribution, patients over 70 years of age were the most
frequent (1,882; 37.48%), and patients aged 18 to 29 years were
the least frequent (35; 0.70%), as shown in Figure 2. The mean
age was 65 ± 11 years (mean ± standard deviation).

Figure 2. The distribution of patients according to age.

2. Distribution of Other Diseases in Patients with Essential Hypertension

The frequency of other diseases in patients with essential
hypertension is shown in Table 1. Non-insulin-dependent
diabetes mellitus (E11) was the most frequent disease (1,765
patients), followed in order by cerebral infarction (I63),
angina pectoris (I20), and chronic renal failure (N18). In both
genders, non-insulin-dependent diabetes mellitus was the most
frequent disease, and cerebral infarction and angina pectoris
were also frequent. Among males, cerebral infarction (I63),
angina pectoris (I20), chronic renal failure (N18), acute
myocardial infarction (I21), gastric ulcer (K25), and prostatic
hyperplasia (N40) were more frequent than among females, but
only acute myocardial infarction (I21) and gastric ulcer (K25)
differed significantly (p < 0.05). Among females,
non-insulin-dependent diabetes mellitus (E11), gastritis and
duodenitis (K29), disorders of lipoprotein metabolism and other
lipidaemias (E78), hemiplegia (G81), heart failure (I50), and
osteoporosis without pathologic fractures (M81) were more
frequent than among males, but only gastritis and duodenitis
(K29), heart failure (I50), and osteoporosis without pathologic
fractures (M81) differed significantly (p < 0.05).



Table 1. The distribution of other diseases in the patients with hypertension

| Dx code | Disease                                                   | Male | Female | Total | χ²     | p-value |
|---------|-----------------------------------------------------------|------|--------|-------|--------|---------|
| E11     | Non-insulin-dependent diabetes mellitus                   | 870  | 895    | 1,765 | 0.354  | 0.552   |
| I63     | Cerebral infarction                                       | 562  | 503    | 1,065 | 3.269  | 0.071   |
| I20     | Angina pectoris                                           | 390  | 340    | 730   | 3.425  | 0.064   |
| N18     | Chronic renal failure                                     | 269  | 241    | 510   | 1.537  | 0.215   |
| K29     | Gastritis and duodenitis                                  | 195  | 268    | 463   | 11.510 | 0.001   |
| I21     | Acute myocardial infarction                               | 209  | 158    | 367   | 7.087  | 0.008   |
| E78     | Disorders of lipoprotein metabolism and other lipidaemias | 168  | 191    | 359   | 1.474  | 0.225   |
| K25     | Gastric ulcer                                             | 198  | 155    | 353   | 5.238  | 0.022   |
| G81     | Hemiplegia                                                | 143  | 151    | 294   | 0.218  | 0.641   |
| I50     | Heart failure                                             | 101  | 181    | 282   | 22.695 | 0.000   |
| K21     | Gastro-oesophageal reflux disease                         | 131  | 138    | 269   | 0.182  | 0.670   |
| N40     | Hyperplasia of prostate                                   | 167  | -      | 167   | -      | -       |
| M81     | Osteoporosis without pathological fracture                | 51   | 153    | 204   | 51.000 | 0.000   |
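The text does not name the test behind the χ² column of Table 1. The tabulated values are, however, consistent with a one-degree-of-freedom goodness-of-fit test that compares the male and female counts for each disease against an equal split. The sketch below works under that assumption; the function names are illustrative, and the p-value uses the closed form for one degree of freedom from the standard library.

```python
import math


def chi_square_equal_split(male: int, female: int) -> float:
    """Goodness-of-fit statistic against equal expected counts (assumed test)."""
    expected = (male + female) / 2
    return (male - expected) ** 2 / expected + (female - expected) ** 2 / expected


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Male and female counts taken from Table 1.
table1 = {"E11": (870, 895), "I21": (209, 158), "M81": (51, 153)}
for code, (male, female) in table1.items():
    chi2 = chi_square_equal_split(male, female)
    print(code, round(chi2, 3), round(p_value_df1(chi2), 3))
```

With equal expected counts the statistic reduces to (male − female)² / (male + female), which makes individual rows easy to check by hand.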
Figure 3. Association graph by using web node. E11: non-insulin-dependent
diabetes mellitus, E87: other disorders of fluid, electrolytes and
acid-base balance, I10: essential hypertension, I21: acute myocardial
infarction, I50: heart failure, I63: cerebral infarction, K21:
gastro-oesophageal reflux disease, N18: chronic renal failure.



Table 2. The result of Apriori modeling application

| Antecedent | Consequent | Support (%) | Confidence (%) | Lift  |
|------------|------------|-------------|----------------|-------|
| E11        | I10        | 35.15       | 100.00         | 1.000 |
| I10        | E11        | 35.15       | 35.15          | 1.000 |
| I63        | I10        | 21.21       | 100.00         | 1.000 |
| I10        | I63        | 21.21       | 21.21          | 1.000 |
| I10, I63   | E11        | 7.91        | 37.31          | 1.062 |
| I10, E11   | I63        | 7.91        | 22.49          | 1.061 |
| I10, E11   | I20        | 5.54        | 15.75          | 1.079 |
| I10, E11   | N18        | 5.52        | 15.69          | 1.545 |

E11: non-insulin-dependent diabetes mellitus, I10: essential hypertension,
I63: cerebral infarction, I20: angina pectoris, N18: chronic renal failure.
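The support and lift columns of Table 2 can be related back to the patient counts in Table 1 with simple arithmetic. The sketch below assumes that support is taken over the 5,022 patients (1,765 / 5,022 ≈ 35.15% for E11) and that lift is the rule confidence divided by the support of the consequent; the function names are illustrative, and small differences from the published figures can arise from rounding.

```python
# Arithmetic check relating Table 2 to Table 1, under the stated assumptions.
N_PATIENTS = 5022
COUNTS = {"E11": 1765, "I63": 1065, "N18": 510}  # totals from Table 1


def support_pct(count: int, n: int = N_PATIENTS) -> float:
    """Support of a single diagnosis as a percentage of all patients."""
    return 100.0 * count / n


def lift_from_confidence(confidence_pct: float, consequent: str) -> float:
    """Lift = confidence of the rule / support of its consequent."""
    return confidence_pct / support_pct(COUNTS[consequent])


print(round(support_pct(COUNTS["E11"]), 2))          # compare with 35.15 in Table 2
print(round(support_pct(COUNTS["I63"]), 2))          # compare with 21.21 in Table 2
print(round(lift_from_confidence(37.31, "E11"), 3))  # rule I10, I63 -> E11
print(round(lift_from_confidence(15.69, "N18"), 3))  # rule I10, E11 -> N18
```

Because every patient in the data mart was selected for having I10, a rule with I10 alone as antecedent or consequent necessarily has a lift of 1: its confidence equals the support of the consequent. Only the rules with two antecedent items can show a lift different from 1.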



3. Result Visualization by Web Node

Figure 3 shows the relationships among essential hypertension
and the high-frequency diseases listed in Table 1, drawn with
the web node. Co-morbid diseases are linked with each other. In
Figure 3, essential hypertension was linked with
non-insulin-dependent diabetes mellitus and cerebral infarction,
and non-insulin-dependent diabetes mellitus was linked with
cerebral infarction. Non-insulin-dependent diabetes mellitus and
cerebral infarction were therefore shown to have a relationship
with essential hypertension. Other diseases, such as disorders
of lipoprotein metabolism and other lipidaemias (E78) and acute
myocardial infarction (I21), did not show a relationship with
essential hypertension.
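A web graph of this kind is built from how often pairs of diagnoses occur in the same patient. The sketch below counts those pairs on toy data. It does not reproduce Clementine's web node; taking the link strength to be a plain co-occurrence count is an assumption of this example, as are the function name and the sample patients.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(transactions):
    """Count, for each unordered pair of codes, the patients who carry both."""
    counts = Counter()
    for diagnoses in transactions:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(diagnoses)), 2):
            counts[(a, b)] += 1
    return counts


patients = [
    {"I10", "E11"},
    {"I10", "I63"},
    {"I10", "E11", "I63"},
]
links = cooccurrence_counts(patients)
for pair, n in links.most_common():
    print(pair, n)
```

Pairs with higher counts would be drawn as stronger links, and pairs below a chosen cut-off would be left unconnected.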

4. Results of ARM Using the Apriori Modeling 

Based on the results of the Apriori modeling, the association
rules among essential hypertension and specific diseases are
shown in Table 2. We extracted 8 association rules using the
following threshold values: support, >5%; and confidence, >15%.
The rule with the highest support and confidence was
'non-insulin-dependent diabetes mellitus to essential
hypertension', which had confidence and support values of 100%
and 35.15%, respectively. The second rule was 'cerebral
infarction to essential hypertension', which had confidence and
support values of 100% and 21.21%, respectively. The third rule
was 'essential hypertension and cerebral infarction to
non-insulin-dependent diabetes mellitus', which had confidence
and support values of 37.31% and 7.91%, respectively. The rule
for 'essential hypertension and non-insulin-dependent diabetes
mellitus to cerebral infarction' had confidence and support
values of 22.49% and 7.91%, respectively. The other rules, for
'essential hypertension and non-insulin-dependent diabetes
mellitus to angina pectoris' and 'essential hypertension and
non-insulin-dependent diabetes mellitus to chronic renal
failure', had a confidence less than 20%.
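The thresholded rule extraction described above can be illustrated with a compact level-wise Apriori sketch. This is not the Clementine implementation used in the study: the function name, the restriction to single-item consequents, and the ten toy patients are assumptions made for the example, with the thresholds set to the values quoted in the text (support > 5%, confidence > 15%).

```python
from itertools import combinations


def apriori_rules(transactions, min_support=0.05, min_confidence=0.15):
    """Return (antecedent, consequent, support, confidence) for rules above both thresholds."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if supp(frozenset([i])) > min_support}
    frequent = {}
    while level:
        for itemset in level:
            frequent[itemset] = supp(itemset)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        # Prune step: every k-item subset of a candidate must be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, len(c) - 1))
            and supp(c) > min_support
        }

    rules = []
    for itemset, s in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:  # single-item consequents only (assumption)
            antecedent = itemset - {consequent}
            conf = s / frequent[antecedent]
            if conf > min_confidence:
                rules.append((tuple(sorted(antecedent)), consequent, s, conf))
    return rules


# Ten invented patients, all with I10, to mimic a hypertension-only data mart.
patients = (
    [{"I10", "E11"}] * 3
    + [{"I10", "E11", "I63"}]
    + [{"I10", "I63"}]
    + [{"I10"}] * 5
)
rules = apriori_rules(patients)
for antecedent, consequent, s, conf in sorted(rules):
    print(antecedent, "->", consequent, round(s, 2), round(conf, 2))
```

Raising `min_confidence` shrinks the rule list in the same way that the >15% cut-off limits Table 2 to 8 rules.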

IV. Discussion 

This study aimed to analyze the association between essential
hypertension and other diseases using Apriori modeling, which
is a popular and powerful method in data mining [10]. We used
53,994 diagnosis records extracted from the D/W, which was
accumulated from electronic medical records. Using the D/W
therefore made it possible to analyze a large volume of data,
in contrast to an epidemiologic study based on a review of
paper-based medical records or a prospective study [11].
Moreover, this study was meaningful in that it analyzed the
association between essential hypertension and various
co-morbid diseases.

Hypertension is known as a risk factor for diabetes mellitus,
cardiovascular disease, and cerebrovascular disease. In this
study, the results based on the web node showed that essential
hypertension, non-insulin-dependent diabetes mellitus, and
cerebral infarction have a relationship with each other. Based
on the results of the Apriori modeling, the association rule
for 'non-insulin-dependent diabetes mellitus to essential
hypertension' had the highest confidence and support. Thus,
essential hypertension and non-insulin-dependent diabetes
mellitus were associated with one another. Lee and Park [12]
stated that 39% of patients with newly diagnosed diabetes
mellitus had co-morbid hypertension. Diabetes mellitus was
2.5-fold more prevalent in patients with hypertension than in
people with normal blood pressure, and hypertension was 3-fold
more frequent in patients with diabetes mellitus. Therefore,
patients with either hypertension or diabetes mellitus need to
manage both blood pressure and blood glucose, because the
co-morbidity of hypertension and diabetes mellitus could be the
basis for an increased clinical attack rate of cardiovascular
disease, cerebral infarction, myocardial infarction, heart
failure, and renal failure [13]. It has also been reported that
>80% of patients with diabetic microangiopathy or diabetic
nephropathy had hypertension as a co-morbidity, and that
patients with hypertension had a 2-fold higher attack rate for
coronary artery disease and a 2- to 6-fold higher attack rate
for cerebrovascular disease than non-diabetics of the same age
group [12]. In this study, we investigated the relationship
between essential hypertension, non-insulin-dependent diabetes,
and other diseases based on ARM. The results showed that
essential hypertension and non-insulin-dependent diabetes were
associated with co-morbid cerebral infarction, angina pectoris,
and chronic renal failure.

We applied ARM to a large electronic medical record database
of patients with hypertension to analyze the association with
co-morbid diseases. However, the data used in this study were
the clinical records of inpatients at a single hospital located
in city D. Moreover, it was difficult to analyze sequential
patterns because, in some cases, patients were diagnosed with
several diseases at the same time. Therefore, studies based on
data collected from various hospitals, aimed at finding general
and sequential rules, will be the subject of further work.